3000 points are sampled randomly and uniformly from the surface of a torus. We wish to make a two-dimensional representation of these points such that local structure is preserved, i.e. points that are close together stay close together. The points are coloured to give a sense of where they belong on the torus.
[Interactive plot: the 3000 sampled torus points]
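A minimal sketch of how such a sample could be drawn. The radii `R = 3` and `r = 1` are assumptions (chosen to match the v1, v2 on [-4, 4] and v3 on [-1, 1] scales quoted later in the post); naive angle sampling is corrected with a rejection step, since otherwise the inner rim of the torus is over-represented.

```r
# Hedged sketch: uniform sampling from a torus surface in base R.
# R (major radius) and r (tube radius) are assumed values, not from the post.
sample_torus <- function(n, R = 3, r = 1) {
  out <- matrix(numeric(0), nrow = 0, ncol = 3)
  while (nrow(out) < n) {
    u <- runif(n, 0, 2 * pi)   # angle around the central axis
    v <- runif(n, 0, 2 * pi)   # angle around the tube
    # Rejection step: keep v with probability proportional to R + r*cos(v),
    # which makes the sample uniform over the surface area.
    keep <- runif(n) < (R + r * cos(v)) / (R + r)
    u <- u[keep]; v <- v[keep]
    out <- rbind(out, cbind((R + r * cos(v)) * cos(u),
                            (R + r * cos(v)) * sin(u),
                            r * sin(v)))
  }
  out[seq_len(n), ]
}

set.seed(1)
A <- as.data.frame(sample_torus(3000))
names(A) <- c("v1", "v2", "v3")
```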
The Barnes-Hut t-SNE algorithm attempts to fit a model with fewer dimensions while preserving local structure. The lower-dimensional model looks a lot like an elastic torus that was split open and flattened out, rather like a popped torus-shaped balloon. The colouring has been changed to make it clear which plots are model output and which are sample data.
[Interactive plot: 2-D t-SNE embedding of the torus sample]
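A hedged sketch of such a fit using the Rtsne package. The package choice and the perplexity value are assumptions; the post does not state which implementation or settings produced this first plot.

```r
library(Rtsne)   # Barnes-Hut t-SNE implementation

# Stand-in for the 3000 x 3 matrix of torus points built earlier.
set.seed(1)
u <- runif(3000, 0, 2 * pi)
v <- runif(3000, 0, 2 * pi)
A <- cbind((3 + cos(v)) * cos(u), (3 + cos(v)) * sin(u), sin(v))

# theta > 0 switches on the Barnes-Hut approximation; theta = 0 is exact.
fit <- Rtsne(A, dims = 2, perplexity = 50, theta = 0.5)

# fit$Y is a 3000 x 2 matrix of embedding coordinates, ready to plot.
plot(fit$Y, pch = 19, cex = 0.5, asp = 1, xlab = "", ylab = "")
```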
A fourth dimension is added which has a linear relationship with another dimension (v3 on the plot). We simply add the colour mapping as a fourth dimension, and see whether we recover the three-dimensional structure with the colour mapping intact. The columns of the 3000 x 4 matrix have been scaled.
[Interactive plot: t-SNE recovery of the structure, with v4 as colour]
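One hedged sketch of the colour-as-a-fourth-dimension step. The exact mapping is an assumption: the post states only that v4 is linear in v3 and, judging by the axis scales quoted later, lies on [0, 3000].

```r
# Stand-in for the torus sample; the post reuses the `A` built earlier.
set.seed(1)
A <- data.frame(v1 = rnorm(3000), v2 = rnorm(3000),
                v3 = runif(3000, -1, 1))

# v4: the colour index added as a fourth dimension. Hypothetical mapping --
# any affine function of v3 covering [0, 3000] has the stated properties.
A$v4 <- 3000 * (A$v3 - min(A$v3)) / (max(A$v3) - min(A$v3))

# Scale the columns so no single dimension dominates the pairwise distances.
As <- scale(as.matrix(A))
```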
By setting the perplexity hyperparameter very high (400), we haven't allowed the points to cluster too tightly, which has preserved the global structure of this embedding quite well. Recall that the matrix columns were scaled. If the matrix hadn't been scaled, the dimension represented by colour would have dominated the Kullback-Leibler divergence, as we will see below.
[Interactive plot: t-SNE embedding of the unscaled data]
This doesn’t tell us much about v1, v2, or v3. This is because v4 is on the scale [0, 3000], while v1, v2 are on the scale [-4, 4] and v3 is on [-1, 1]. This shows the importance of thinking carefully about the topological space the data occupy and how we measure variables.
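To see why, compare Euclidean distances between points placed at those scales. The specific coordinates below are made up for illustration.

```r
# Three points at the quoted scales: v1, v2 on [-4, 4], v3 on [-1, 1],
# and v4 on [0, 3000]. Values are illustrative, not from the post's data.
p <- c(-4, -4, -1,    0)
q <- c( 4,  4,  1,    0)   # opposite corner in v1..v3, identical v4
r <- c(-4, -4, -1, 3000)   # identical in v1..v3, opposite extreme of v4

dist_pq <- sqrt(sum((p - q)^2))   # ~11.5: the entire v1..v3 geometry
dist_pr <- sqrt(sum((p - r)^2))   # 3000: the v4 axis alone

# v4 dwarfs the other dimensions, so the fit captures v4 and little else.
```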
Why do we need a fancy Barnes-Hut t-SNE algorithm to do what other dimension-reducing algorithms such as PCA or MDS have been doing for years? We attempt the same fit using classical MDS for comparison -
[Interactive plot: classical MDS embedding]
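Classical MDS needs only base R's `cmdscale`; a minimal sketch, assuming the post embeds the scaled matrix in three dimensions (a smaller stand-in sample keeps the O(n²) distance matrix quick here).

```r
# Stand-in data; the post applies this to its scaled 3000 x 4 matrix.
set.seed(1)
n  <- 500
As <- scale(matrix(rnorm(n * 4), ncol = 4))

# Classical (metric) MDS: embed the pairwise distances in k dimensions.
mds <- cmdscale(dist(As), k = 3)

# `mds` is an n x 3 coordinate matrix, plotted with plot3d as before.
```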
This is an excellent representation of both global and local structure.
To increase the level of difficulty, all linear relationships are removed. Now the fourth dimension has a non-linear relationship with v3 and no correlation with v2 or v1.
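The post does not give the exact transform; one hypothetical construction with the stated properties (non-linear in v3, no correlation with v1 or v2) might be:

```r
# Stand-in sample; the post reuses its torus data frame `A`.
set.seed(1)
A <- data.frame(v1 = rnorm(3000), v2 = rnorm(3000),
                v3 = runif(3000, -1, 1))

# Hypothetical v4: a pure function of v3, so it carries no information
# about v1 or v2, and v3^2 is non-linear in v3 (near-zero *linear*
# correlation, since v3 is roughly symmetric about 0).
A$v4 <- A$v3^2
```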
[Interactive plot: v4's non-linear relationship with v3]
[Interactive plot: t-SNE embedding of the non-linear data]
How well does MDS capture this non-linear relationship?
[Interactive plot: MDS embedding of the non-linear data]
In an odd way it does represent global structure quite well, but not local structure. For example, the ends (in black) have been twisted back around to each other.
Add dimension v5 which has a non-linear relationship to v1, v2 & v3, and a linear relationship with v4.
[Interactive plot]
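Again the exact formula is not given; a hypothetical v5 that is non-linear in v1, v2 and v3 but linear in v4 could be built like this:

```r
# Stand-in sample with a v4 column; the post reuses its own `A`.
set.seed(1)
A <- data.frame(v1 = rnorm(3000), v2 = rnorm(3000),
                v3 = runif(3000, -1, 1))
A$v4 <- A$v3^2   # hypothetical non-linear function of v3

# Hypothetical v5: non-linear in v1..v3, but linear in v4 (coefficient 2,
# holding the other terms fixed).
A$v5 <- sqrt(A$v1^2 + A$v2^2) * A$v3 + 2 * A$v4
```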
Add dimension v6 which gets weird fast.
A$v6 <- (A$v2)^2 + abs(A$v5 - A$v3)  # sqrt((x)^2) is just abs(x)
with(A, plot3d(v2, v6, v5, type = "s", size = 1, col = cr[v4]))
with(A, plot3d(v3, v5, v6, type = "s", size = 1, col = cr[v4]))
[Interactive plot]
We’re up to 6 dimensions, so feasibly we could explore them by looking at every combination (here’s another representation by way of example)
[Interactive plot]
But for every dimension we add, we have d choose 3 possible spatial representations. If we have 10 dimensions, that’s 120 relationships that we can visualise (and if, like I did, you think using colour as a way to represent a fourth dimension will always reduce the solution set, consider that 10 choose 4 = 210…). Of course, some of these dimensions may have no relationships, but in practice, how do we know?
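A quick check of that combinatorial blow-up with base R's `choose`:

```r
# Number of 3-D scatterplots available when there are d = 10 dimensions:
choose(10, 3)   # 120
# Using colour to encode a fourth variable enlarges, not shrinks, the set:
choose(10, 4)   # 210
```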
Setting perplexity = 750, the t-SNE algorithm will take a long time to compute the embeddings, but the result is superb. If you take some time to explore all of the plots above, you will see that those shapes are embedded in the dimension-reduced t-SNE plot.
[Interactive plot: t-SNE embedding with perplexity 750]
Not wishing to disappoint fans of MDS -
[Interactive plot: MDS embedding]
Some of the relationships have been captured astonishingly well.